test: strengthen Coord/Spec/Coord ping-pong handoff regression test#5825
Closed
Copilot wants to merge 15 commits into
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/43f2a9ba-4aac-4491-89ba-8de379eefc2b Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/4bacd1d2-5911-4fde-8cfb-375ef0808563 Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/789cdb87-c90d-4681-b8e0-9b6b63762856 Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/9241f85a-8657-4979-8d64-0736dc6e7ebb Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/fd187fc2-3ed5-4204-8340-d4b3c391e989 Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
Copilot created this pull request from a session on behalf of lokitoth on May 13, 2026 19:46
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/9a5433df-55bf-402c-aae8-c79a6d22e677 Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
Copilot (AI) requested review from Copilot and lokitoth, and removed the request for Copilot, on May 13, 2026 19:49
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/570d306e-62db-44a6-998a-915fc8fe8807 Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
HandoffAgentExecutor synthesizes a 'Transferred.' tool-result update for each handoff function call. That update was created without setting ResponseId, so MessageMerger routed it to the global dangling bucket and flushed all such tool results at the very end of the merged AgentResponse, breaking per-step grouping for multi-step handoffs (see #4544). Streaming output already preserved order because updates are yielded directly without going through the merger.

Fix: stamp the synthesized update with the same ResponseId as the preceding agent stream updates so it groups with the agent's other messages in MessageMerger.

Also adds a regression test that drives the real workflow through WorkflowHostAgent.RunAsync over a 3-agent handoff chain and asserts the per-step message ordering of the merged response.
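The effect described in this commit message can be modelled in a few lines. The sketch below is an illustrative Python model, not the framework's C# code: `Update`, `merge`, and `handoff_chain` are hypothetical names, and the merger is assumed to group updates by response ID in first-seen order and append ID-less ("dangling") updates at the end, as the commit message describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Update:
    """Hypothetical stand-in for a streamed agent response update."""
    text: str
    response_id: Optional[str] = None


def merge(updates: list[Update]) -> list[str]:
    """Group updates by response_id in first-seen order; append dangling ones last."""
    groups: dict[str, list[str]] = {}  # dicts preserve insertion order
    dangling: list[str] = []
    for update in updates:
        if update.response_id is None:
            dangling.append(update.text)
        else:
            groups.setdefault(update.response_id, []).append(update.text)
    merged = [text for texts in groups.values() for text in texts]
    return merged + dangling


def handoff_chain(stamp: bool) -> list[Update]:
    """Three-agent chain; each handoff call is followed by a synthesized tool result.

    With stamp=False the synthesized result carries no response ID (the bug);
    with stamp=True it reuses the ID of the agent's preceding updates (the fix).
    """
    return [
        Update("A: call handoff_to_B", "r1"),
        Update("A: Transferred.", "r1" if stamp else None),
        Update("B: call handoff_to_C", "r2"),
        Update("B: Transferred.", "r2" if stamp else None),
        Update("C: final answer", "r3"),
    ]


before = merge(handoff_chain(stamp=False))
after = merge(handoff_chain(stamp=True))
```

In this model, `before` ends with both "Transferred." results after agent C's answer, while `after` keeps each result next to the handoff call that produced it.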
Copilot (AI) changed the title from ".NET: Validate MessageMerger ordering invariants" to "Fix multi-step handoff message ordering in non-streaming RunAsync (#4544)" on May 28, 2026
Copilot (AI) changed the title from "Fix multi-step handoff message ordering in non-streaming RunAsync (#4544)" to "Fix multi-step handoff message ordering in non-streaming RunAsync" on May 28, 2026
Copilot (AI) changed the title from "Fix multi-step handoff message ordering in non-streaming RunAsync" to "Investigate HandoffToolMismatchRepro against AI Project chat client" on May 28, 2026
…s in coord/spec/coord ping-pong handoff
Copilot (AI) changed the title from "Investigate HandoffToolMismatchRepro against AI Project chat client" to "test: strengthen Coord/Spec/Coord ping-pong handoff regression test" on May 28, 2026
Contributor
Closing, test transferred in #6140.
Motivation and Context
MessageMerger, the internal component that folds streaming AgentResponseUpdate items into a final AgentResponse, had an implicit contract with no tests validating its ordering and grouping behavior. This created two issues:

- Message ordering bug: When updates lacked CreatedAt timestamps, CompareByDateTimeOffset treated null timestamps as "greater than" any value, pushing untimestamped messages unpredictably to the end rather than preserving their arrival order. In multi-agent scenarios (handoff, group chat), this caused message reordering that broke conversation coherence.
- Missing invariant documentation: The merger's guarantees were never written down, and the code contained dead state (the createdTimes HashSet) suggesting abandoned functionality. Future refactors risked silently breaking the contract.

Description
This PR fixes the message ordering issue, documents the merger invariants in ADR 0026, and adds comprehensive tests to pin the expected behavior.
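The ordering fix (falling back to arrival order when a timestamp is missing or two timestamps are equal) can be modelled as follows. This is a minimal Python sketch under stated assumptions, not the C# comparer from the PR: `Entry`, `compare`, and `order` are hypothetical names, and each message is assumed to carry the index at which it arrived.

```python
from datetime import datetime, timezone
from functools import cmp_to_key
from typing import Optional

# (arrival index, text, created_at): a hypothetical stand-in for a merged message
Entry = tuple[int, str, Optional[datetime]]


def compare(a: Entry, b: Entry) -> int:
    """Order by timestamp; fall back to the arrival index when a timestamp is
    missing on either side or the two timestamps are equal."""
    (ia, _, ta), (ib, _, tb) = a, b
    if ta is None or tb is None or ta == tb:
        return (ia > ib) - (ia < ib)
    return (ta > tb) - (ta < tb)


def order(messages: list[tuple[str, Optional[datetime]]]) -> list[str]:
    """Attach arrival indices, sort with the comparer, and return the texts."""
    entries = [(i, text, ts) for i, (text, ts) in enumerate(messages)]
    return [text for _, text, _ in sorted(entries, key=cmp_to_key(compare))]


t1 = datetime(2026, 5, 13, 19, 46, tzinfo=timezone.utc)
t2 = datetime(2026, 5, 13, 19, 49, tzinfo=timezone.utc)

# Missing or equal timestamps: every pair falls back to arrival order.
untimestamped = order([("m0", None), ("m1", t1), ("m2", None), ("m3", t1)])
# Distinct timestamps on both sides: ordered by time.
timestamped = order([("late", t2), ("early", t1)])
```

Here `untimestamped` keeps the messages in the order they arrived, and `timestamped` is sorted by time.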
Bug fix in MessageMerger.CompareByDateTimeOffset:

- When CreatedAt is null for either message, or both timestamps are equal, the comparer now falls back to the original insertion index, preserving arrival order

ADR 0026 establishes three invariants:

1. ResponseId per turn: Hosting executors must assign a ResponseId if the agent doesn't provide one; updates with ResponseId == null are "dangling" and appended at the end
2. […] CreatedAt, their relative order in the merged output matches arrival order
3. ResponseId grouping: Messages from each ResponseId appear as a contiguous block (no interleaving), enabling per-agent grouping in multi-agent scenarios

Cleanup:

- Removed the createdTimes HashSet that was populated but never consumed

Test coverage added in MessageMergerTests:

- ResponseId grouping for interleaved multi-agent streams
- ResponseId grouping with distinct response IDs
- FinishReason propagation

Contribution Checklist